The next phase of hardware came, not from 3Dfx, but from a new company, NVIDIA. While 3Dfx's Voodoo II was much more popular than NVIDIA's product, the NVIDIA Riva TNT (released in 1998) was more interesting in terms of what it brought to the table for programmers. Voodoo II was purely a performance improvement; TNT was the next step in the evolution of graphics hardware.
Like other graphics cards of the day, the TNT hardware had no vertex processing. Vertex data arrived in clip space, as was the norm, so the CPU had to do all of the transformation and lighting. Where the TNT shone was in its fragment processing. The power of the TNT was in its name: TNT stands for TwiN Texel. It could access two textures at once. And while the Voodoo II could do that as well, the TNT had much more flexibility in its fragment processing pipeline.
In order to accommodate two textures, the vertex input was expanded. Two textures meant two texture coordinates, since each texture coordinate was directly bound to a particular texture. While they were doubling things up, NVIDIA also allowed for two per-vertex colors; the idea behind those has to do with lighting equations.
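To make the expanded vertex format concrete, here is a hypothetical layout an application might have filled in on the CPU; the struct and its field names are illustrative, not an actual API:

```c
/* Hypothetical vertex layout for a two-texture card like the TNT.
 * The CPU fills in everything, including the clip-space position;
 * the hardware only interpolates these values across each triangle. */
typedef struct {
    float pos[4];       /* clip-space position, transformed on the CPU   */
    float color0[4];    /* first per-vertex color (e.g. diffuse light)   */
    float color1[4];    /* second per-vertex color (e.g. specular light) */
    float texCoord0[2]; /* coordinate for texture 0                      */
    float texCoord1[2]; /* coordinate for texture 1                      */
} TntVertex;
```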
For regular diffuse lighting, the CPU-computed color would simply be dot(N, L), possibly with attenuation applied. Indeed, it could be the result of any arbitrarily complex diffuse lighting function, since it was all computed on the CPU. This diffuse light intensity would be multiplied by the texture, which represented the diffuse absorption of the surface at that point.
This becomes less useful if you want to add a specular term. The specular absorption and diffuse absorption of a surface are not necessarily the same, after all. And while you may not need a specular texture, you do not want the specular component added to the diffuse component before the diffuse is multiplied by the texture. You want to do the addition afterwards.
This is simply not possible if you have only one per-vertex color. But it becomes possible if you have two. One color is the diffuse lighting value. The other color is the specular component. We multiply the first color by the diffuse color from the texture, then add the second color as the specular reflectance.
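A minimal sketch of that split, in plain C standing in for the per-fragment hardware combine (all names here are illustrative):

```c
typedef struct { float r, g, b; } Color;

/* Sketch of the two-color split. The CPU computes both colors per
 * vertex, using whatever lighting model it likes; the hardware
 * interpolates them and combines them with the sampled texel per
 * fragment, roughly like this: */
static Color shadeFragment(Color texel, Color diffuse, Color specular)
{
    /* The texture scales only the diffuse term; the specular term is
     * added afterwards, untouched by the texture. */
    Color out;
    out.r = texel.r * diffuse.r + specular.r;
    out.g = texel.g * diffuse.g + specular.g;
    out.b = texel.b * diffuse.b + specular.b;
    return out;
}
```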
Which brings us nicely to fragment processing. The TNT's fragment processor had 5 inputs: 2 colors sampled from textures, 2 colors interpolated from vertices, and a single “constant” color. The latter, in modern parlance, is the equivalent of a shader uniform value.
That's a lot of potential inputs. The solution NVIDIA came up with to produce a final color was a bit of fixed functionality that we will call the texture environment. It is directly analogous to the OpenGL 1.1 fixed-function pipeline, but with extensions for multiple textures and some TNT-specific features.
The idea is that each texture has an environment. The environment is a specific math function, such as addition, subtraction, multiplication, or linear interpolation. The operands to this function can be taken from any of the fragment inputs, as well as a constant zero color value.
An environment can also use the result of the previous environment as one of its arguments. Textures and environments are numbered from zero to one (two textures, two environments); the first environment executes, followed by the second.
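This model was later standardized in OpenGL's ARB_texture_env_combine extension, so that interface makes a convenient illustration, even though the TNT itself exposed the functionality through its own extensions at the time. A sketch of a modulate-then-add cascade, assuming a valid GL context, ARB_multitexture, and two bound textures:

```c
#include <GL/gl.h>
#include <GL/glext.h>

/* Sketch: environment 0 modulates texture 0 by the interpolated vertex
 * color; environment 1 adds texture 1 to that result. */
void setupTwoEnvironments(void)
{
    /* Environment 0: multiply texture 0 by the vertex color. */
    glActiveTextureARB(GL_TEXTURE0_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_ARB, GL_MODULATE);
    glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_RGB_ARB, GL_TEXTURE);
    glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_RGB_ARB, GL_PRIMARY_COLOR_ARB);

    /* Environment 1: add texture 1 to the result of environment 0. */
    glActiveTextureARB(GL_TEXTURE1_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_ARB, GL_ADD);
    glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_RGB_ARB, GL_PREVIOUS_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_RGB_ARB, GL_TEXTURE);
}
```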
If you look at it from a hardware perspective, what you have is a two-opcode assembly language. The available registers for the language are two vertex colors, a single uniform color, two texture colors, and a zero register. There is also a single temporary register to hold the output from the first opcode.
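Viewed that way, the whole stage can be modeled as a tiny two-instruction program over a fixed register set. The following C sketch is purely illustrative; no real driver represents it this way:

```c
/* Toy model of the TNT fragment stage as a two-opcode register machine. */
typedef enum { REG_COL0, REG_COL1, REG_CONST, REG_TEX0, REG_TEX1,
               REG_ZERO,  /* constant zero color           */
               REG_TEMP   /* result of the first operation */ } Reg;

typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_LERP } Op;

typedef struct {
    Op  op;
    Reg a, b;
    Reg t;  /* interpolation factor, used only by OP_LERP */
} Instruction;

/* A two-instruction "program" for diffuse * texture + specular:
 *   temp = tex0 * col0
 *   out  = temp + col1                                      */
static const Instruction program[2] = {
    { OP_MUL, REG_TEX0, REG_COL0, REG_ZERO },
    { OP_ADD, REG_TEMP, REG_COL1, REG_ZERO },
};
```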
Graphics programmers, by this point, had gotten used to multipass-based algorithms. After all, until the TNT, that was the only way to apply multiple textures to a single surface. And even the TNT imposed a pretty confining limit: two textures and two opcodes. This was powerful, but quite limited; two opcodes really were not enough.
The TNT cards also provided something else: 32-bit framebuffers and depth buffers. While the Voodoo cards used high-precision math internally, they still wrote to 16-bit framebuffers, using a technique called dithering to make them look like higher precision. But dithering was nothing compared to an actual high-precision framebuffer. And it did nothing for the artifacts of a 16-bit depth buffer.
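For the curious, ordered (Bayer) dithering of the kind those 16-bit framebuffers relied on can be sketched in a few lines; the matrix and the RGB565 packing are standard, but the function itself is illustrative:

```c
#include <stdint.h>

/* Sketch of ordered dithering into a 16-bit RGB565 pixel. Biasing each
 * channel by a position-dependent threshold before truncating makes the
 * quantization error vary from pixel to pixel, hiding banding. */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

static uint16_t ditherToRGB565(uint8_t r, uint8_t g, uint8_t b, int x, int y)
{
    int t  = bayer4[y & 3][x & 3];                 /* threshold, 0..15 */
    int r5 = (r + (t >> 1)) >> 3; if (r5 > 31) r5 = 31;
    int g6 = (g + (t >> 2)) >> 2; if (g6 > 63) g6 = 63;
    int b5 = (b + (t >> 1)) >> 3; if (b5 > 31) b5 = 31;
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}
```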
While the original TNT could do 32-bit rendering, it lacked the memory and overall performance to really show it off; that had to wait for the TNT2. Combined with 3Dfx's product delays and some poor strategic moves, this made NVIDIA one of the dominant players in the consumer PC graphics card market. Its position was cemented by its next card, which had real power behind it.